{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/oughtinc/ergo/blob/master/notebooks/el-paso-workflow.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# What is Ergo?\n",
    "We believe that many of the pieces for good forecasting work exist, but are difficult to connect in a productive workflow. We think that building better infrastructure for forecasting can help create more scalable, high quality forecasting. \n",
    "\n",
    "Ergo provides part of that infrastructure. It's a python library for integrating model-based and judgmental forecasting. Currently, it provides an easy way to create and combine probability distributions, as well as predictions from community predictions sites like Metaculus, to help produce more accurate judgments. \n",
    "\n",
    "To showcase this, we've put together an example use case for predicting max ventilator use in El Paso below. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install --progress-bar off poetry\n",
    "!pip install --progress-bar off git+https://github.com/oughtinc/ergo.git@e9f6463de652f4e15d7c9ab0bb9f067fef8847f7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import ssl\n",
    "warnings.filterwarnings(action=\"ignore\", category=FutureWarning)\n",
    "warnings.filterwarnings(action=\"ignore\", module=\"plotnine\")\n",
    "ssl._create_default_https_context = ssl._create_unverified_context"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ergo\n",
    "import seaborn\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from datetime import timedelta, date\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "pd.set_option('precision', 2)\n",
    "\n",
    "def summarize_samples(samples):\n",
    "    stats = samples.describe(percentiles=[0.05, 0.5, 0.95])\n",
    "    percentile = lambda pt: float(stats.loc[f\"{pt}%\"])\n",
    "    return f\"{percentile(50):.2f} ({percentile(5):.2f} to {percentile(95):.2f})\"\n",
    "\n",
    "def show_marginal(func):\n",
    "    samples = ergo.run(func, num_samples=1000)[\"output\"]\n",
    "    seaborn.distplot(samples).set_title(func.__doc__);\n",
    "    plt.show()\n",
    "    print(f\"Median {func.__doc__}: {summarize_samples(samples)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Choose a decision-relevant question"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many ventilators will be needed in El Paso?\n",
    "\n",
    "I want to predict the answer to this question:\n",
    "\n",
    "> In El Paso, what is the maximum number of ventilators needed for COVID patients on a given day in May?\n",
    "\n",
    "In other words: on the day when most ventilators are needed, how many will be needed? For this tutorial, I'll pretend that we're making the prediction at the end of April -- I'll only use information that we could have had at that time.\n",
    "\n",
    "Background:\n",
    "\n",
    "- El Paso is a county in Texas (population 850,000) that is dealing with the impacts of COVID-19.\n",
    "- The community prediction site [Metaculus](https://www.metaculus.com) partnered with administrators at Texas Tech University Health Sciences Center to predict on questions that will guide decision making there.\n",
    "- This question is particularly decision-relevant: it's crucial to have enough ventilators for the patients who need them.\n",
    "\n",
    "## Loading question data from the Metaculus crowd prediction platform\n",
    "\n",
    "Ergo can load questions and make predictions on Metaculus and [Foretold](https://www.foretold.io/), another community prediction site.\n",
    "\n",
    "I'll load [the relevant question](https://pandemic.metaculus.com/questions/4204)* from Metaculus:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metaculus = ergo.Metaculus(username=\"oughtpublic\", password=\"123456\", api_domain=\"pandemic\")\n",
    "ventilators_question = metaculus.get_question(4201, name=\"# ventilators needed\")\n",
    "ventilators_question"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\**Note: We're not asking exactly the same question as the Metaculus question. Most importantly, the Metaculus question is asking for the number of ventilators needed on a \"peak hospitalizations\" day that's defined in a complex way. To make things simpler, \n",
    "we're just asking how many ventilators are needed on the day when the most ventilators are needed.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A guess"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll start by making an uninformed guess. I'll do this before I look at any data or anyone else's predictions. That way, I can look back to this guess to see what I learn as I go through the forecasting process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is my 90% confidence interval for the number of ventilators needed? I think there is a \n",
    "\n",
    "- less than 5% chance that <3 people will require ventilation\n",
    "- less than 5% chance that >100 people will require ventilation\n",
    "\n",
    "So I'll go with a 90% confidence interval of [3, 100].\n",
    "\n",
    "I don't know much about El Paso, ICUs, or ventilators. I feel pretty uncertain and assign non-negligible probability to higher values. I'll go with a [lognormal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution), which will assign more probability to the \"long tail\" of higher values than would a normal distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributions in Ergo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Ergo, I can generate a single sample from this distribution like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ergo.lognormal_from_interval(3, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get many samples, I use `ergo.run`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ventilators_needed():\n",
    "    \"# ventilators needed\"\n",
    "    return ergo.lognormal_from_interval(3, 100)\n",
    "\n",
    "samples = ergo.run(ventilators_needed, num_samples=1000)\n",
    "\n",
    "samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why sample instead of manipulating distributions directly? To use distributions in our prediction, we need to be able to combine distributions for sub-predictions together. Combining distributions can be difficult, as it requires non-trivial math. \n",
    "\n",
    "Ergo allows you to create samplers (like the `ergo.lognormal_from_interval` above), which you can compose together using regular python operators (`*, +, -`). This allows you to focus on how you're modeling the problem, rather than the math. I'll build a complex model for this problem by composising samplers together."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing distributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I can visualize these samples using my question's `show_prediction` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ventilators_question.show_prediction(samples[\"output\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model V1: My guess"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll wrap my guess in a `Model` class so that I can more easily build on it step by step by inheriting from the class, adding more methods, and replacing methods with better implementations. This also lets me see how my prediction changes as I make it more sophisticated. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ModelV1:\n",
    "    \n",
    "    def ventilators_needed(self):\n",
    "        \"# ventilators needed\"\n",
    "        return ergo.lognormal_from_interval(3, 100)\n",
    "\n",
    "    def run(self):\n",
    "        samples = ergo.run(self.ventilators_needed, num_samples=1000)[\"output\"]\n",
    "        ventilators_question.show_prediction(samples); plt.show()\n",
    "        print(f\"Median estimate of # ventilators needed: {summarize_samples(samples)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This doesn't change the result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_v1 = ModelV1()\n",
    "model_v1.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's still an uninformed guess."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model V2: Decompose ventilators needed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## My decomposition\n",
    "\n",
    "My strategy for this question will be to break it into pieces and then improve my estimate for each piece.\n",
    "\n",
    "To estimate the maximum number of ventilators needed, I'll multiply the answers to:\n",
    "\n",
    "1. What's the maximum number of patients who will be in the ICU at once?\n",
    "2. What fraction of those patients will be on ventilators?\n",
    "\n",
    "So my decomposition looks like this:\n",
    "\n",
    "- **\\# ventilators needed** =\n",
    "  - max # icu patients *\n",
    "  - % of icu patients ventilated\n",
    " \n",
    "For both of these I'll make pretty uninformed guesses for now:\n",
    "\n",
    "1. I guess that about 5 to 200 people will be the maximum number of people in the ICU (but I have no idea)\n",
    "2. I'd guess that 1 out of 3 will need ventilators (another uninformed guess)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ICU cases\n",
    "\n",
    "I'll use the same `lognormal_from_interval` function I used above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def max_icu_patients():\n",
    "    \"max # icu patients\"\n",
    "    return ergo.lognormal_from_interval(5, 200)\n",
    "\n",
    "show_marginal(max_icu_patients)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Proportion of ICU patients on ventilators\n",
    "\n",
    "I'll use a beta-binomial distribution via `ergo.beta_from_hits`.\n",
    "\n",
    "`ergo.beta_from_hits`:\n",
    "1. Assumes that there's some constant underlying probability that a condition is met in a situation (in this case, the condition is that a patient will be on a ventilator given that they're in the ICU)\n",
    "2. Takes in a number of observations of the situation (patients in the ICU) and the number of observations that met the condition (number of patients in the ICU who needed a ventilator)\n",
    "3. Returns a sample from the beta-binomial distribution of underlying probabilities that the condition is met in the situation (again, the probability that a patient will be on a ventilator given that they're in the ICU)\n",
    "\n",
    "[This explanation](http://varianceexplained.org/statistics/beta_distribution_and_baseball/) helped me understand it.\n",
    "\n",
    "Here, I'll say that there were 3 patients in the ICU and 1 of them was ventilated.\n",
    "\n",
    "So, I think that about 1/3 of patients in the ICU are ventilated, but since I only have 3 total observations, I'm not at all confident and the true proportion could be much higher or lower."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def frac_icu_ventilated():\n",
    "    \"% of icu patients ventilated\"\n",
    "    return ergo.beta_from_hits(1, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This distribution looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(frac_icu_ventilated)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## My new model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My model now looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ModelV2(ModelV1):\n",
    "    \n",
    "    def ventilators_needed(self):\n",
    "        \"# ventilators needed\"\n",
    "        return self.frac_icu_ventilated() * self.max_icu_patients()   \n",
    "    \n",
    "    def max_icu_patients(self):\n",
    "        \"max # of icu patients\"\n",
    "        return ergo.lognormal_from_interval(5, 200)\n",
    "\n",
    "    def frac_icu_ventilated(self):\n",
    "        \"% of icu patients ventilated\"\n",
    "        return ergo.beta_from_hits(1, 3)\n",
    "    \n",
    "    def run(self):\n",
    "        # we want to be able to compare versions of the model as we change it, so we use some python \n",
    "        # metaprogramming to compute the prediction from our previous model.\n",
    "        prev_model_samples = ergo.run(self.__class__.__bases__[0]().ventilators_needed, num_samples=1000)[\"output\"]\n",
    "        \n",
    "        # we then plot them against one another, showing how the model has changed\n",
    "        this_model_samples = ergo.run(self.ventilators_needed, num_samples=1000)[\"output\"]\n",
    "        samples_df = pd.DataFrame(data={\"Previous model\": prev_model_samples, \"This model\": this_model_samples})\n",
    "        ventilators_question.show_prediction(samples_df); plt.show()\n",
    "        print(f\"Median estimate of # ventilators needed: {summarize_samples(this_model_samples)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_v2 = ModelV2()\n",
    "model_v2.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This model's prediction is about the same as the previous model's."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model V3: Use external model for # of ICU patients"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far, I've just made an uninformed guess about the maximum number of patients in the ICU.\n",
    "\n",
    "I could do more modeling myself to get a more accurate number, but it seems better to just use [this model](https://github.com/shaman-lab/COVID-19Projection)\n",
    " ([paper](https://www.medrxiv.org/content/10.1101/2020.03.21.20040303v2)) from the Shaman lab at Columbia university.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\**Note: the Columbia model directly makes predictions for how many ventilators will be needed, so it would be most sensible to just use those numbers directly. But, that wouldn't make for a very useful or informative tutorial. So instead, I'll pretend that the ventilator predictions don't exist, and I'll make my own based on the Columbia prediction for number of people in the ICU.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our team made an interface to the Columbia projections, which I'll use here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ergo.contrib.el_paso import shaman\n",
    "\n",
    "cu_projections = shaman.load_cu_projections(\"El Paso County TX\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## My new model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I can change my model to use the Columbia projections rather than my guess from before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ModelV3(ModelV2):\n",
    "    def max_icu_patients(self):\n",
    "        \"max # of icu patients\"\n",
    "        return max(shaman.cu_projections_for_dates(\"ICU\", date(2020,5,1), date(2020,6,1), cu_projections))\n",
    "\n",
    "model_v3 = ModelV3()\n",
    "model_v3.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Why did my model results change?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My new model projects many more ventilators needed. I can see that this is because the Columbia model has a much fatter tail for the max number of ICU patients than my previous guess had:\n",
    "\n",
    "#### 1. My previous guess for max ICU patients"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v2.max_icu_patients)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. The Columbia model prediction for max ICU patients"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v3.max_icu_patients)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model V4: Decompose fraction of ICU patients ventilated"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I want to improve my estimate for the fraction of ICU patients ventilated:\n",
    "\n",
    "- \\# ventilators needed =\n",
    "  - max # icu patients *\n",
    "  - **% of icu patients ventilated**\n",
    "  \n",
    "I can break it down as:\n",
    "- **% of icu patients ventilated** =\n",
    "  - \\% icu patients that need ventilation at some point while in the ICU *\n",
    "  - icu-ventilation adjustment (see below)\n",
    "  \n",
    "I'll skip guessing these values and just go ahead and use data to model them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fraction of ICU patients that need ventilation at some point during their ICU stay\n",
    "\n",
    "I'll model this based on the data that [PabloStafforini et al gathered on Metaculus](https://pandemic.metaculus.com/questions/4154/#comment-28155)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ventilated_data = [\n",
    "    {\n",
    "        \"study\": \"Wuhan 1\",\n",
    "        \"source_url\": \"https://www.thelancet.com/pdfs/journals/lanres/PIIS2213-2600(20)30110-7.pdf\",\n",
    "        \"icu\": 52,\n",
    "        \"ventilated\": 22\n",
    "    },\n",
    "    {\n",
    "        \"study\": \"Wuhan 2\",\n",
    "        \"source_url\": \"http://doi.org/10.1001/jama.2020.1585\",\n",
    "        \"icu\": 36,\n",
    "        \"ventilated\": 17\n",
    "    },\n",
    "    {\n",
    "        \"study\": \"Lombardy\",\n",
    "        \"source_url\": \"http://doi.org/10.1001/jama.2020.5394\",\n",
    "        \"icu\": 1300,\n",
    "        \"ventilated\": 1150\n",
    "    },\n",
    "    {\n",
    "        \"study\": \"UK\",\n",
    "        \"source_url\": \"https://doi.org/10.1136/bmj.m1201\",\n",
    "        \"icu\": 196,\n",
    "        \"ventilated\": 132\n",
    "    },\n",
    "]\n",
    "\n",
    "ventilated_df = pd.DataFrame(ventilated_data)\n",
    "\n",
    "ventilated_df[\"%_ventilated\"] = ventilated_df[\"ventilated\"] / ventilated_df[\"icu\"]\n",
    "\n",
    "ventilated_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll use `ergo.beta_from_hits` again to model the fraction of ICU patients that we expect to need ventilation.\n",
    "\n",
    "Naively, we'd just add up all of the ICU patients across all of the studies and all of the ventilated patients across all of the studies and input them to `ergo.beta_from_hits`.\n",
    "\n",
    "However, recall from above that `ergo.beta_from_hits` assumes that all of the samples provided to it are from the same situation.\n",
    "\n",
    "In reality, the situation differs between the 4 studies and between the studies and El Paso, due to differences in how COVID works in different locations, differences in measurement and record-keeping, etc.\n",
    "\n",
    "So if we just added up the ICU and ventilated patients and input them into `ergo.beta_from_hits`, we'd be way too confident in our answer.\n",
    "\n",
    "Instead, I'll do a bit of somewhat-unprincipled math to:\n",
    "1. Gain some uncertainty\n",
    "2. Instead of weighing every study participant equally (and thus weighing large studies overwhelmingly more than small studies), weigh larger studies somewhat more"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ventilated_df[\"log_participants\"] = np.log(ventilated_df[\"icu\"])\n",
    "ventilated_df[\"weight\"] = ventilated_df[\"log_participants\"] / sum(ventilated_df[\"log_participants\"])\n",
    "ventilated_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weighted_avg_percent_ventilated = sum(ventilated_df[\"%_ventilated\"] * ventilated_df[\"weight\"])\n",
    "weighted_avg_percent_ventilated"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def icu_receive_ventilation():\n",
    "    \"% icu patients that need ventilation at some point while in the ICU\"\n",
    "    # as explained above, this sets our level of uncertainty\n",
    "    total_participants = 200\n",
    "    return ergo.beta_from_hits(total_participants * weighted_avg_percent_ventilated, total_participants)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ICU-ventilation adjustment\n",
    "\n",
    "### Why is this necessary?\n",
    "\n",
    "It might seem that I could just multiply `(max # icu patients) * (% icu patients that need ventilation at some point while in the ICU)` to get `ventilators needed`, but I cannot.\n",
    "\n",
    "To see this, first imagine that:\n",
    "1. There are 100 patients in the ICU\n",
    "2. 1/2 of ICU patients need ventilation at some point during their stay\n",
    "\n",
    "Then `(max # icu patients) * (% icu patients that need ventilation at some point while in the ICU)` = 50. This may give too high an estimate of ventilators needed. \n",
    "\n",
    "Imagine that:\n",
    "1. Patients who will need ventilation at some point during their ICU stay remain in the ICU for the same total amount of time as ICU patients who will never need ventilation\n",
    "2. ICU patients only need ventilation for 1/2 of their ICU stay\n",
    "\n",
    "In this scenario, in fact only 1/4 * 100 = 25 ICU patients will need ventilation at any one time.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\**Note: Alternatively, `(max # icu patients) * (% icu patients that need ventilation at some point while in the ICU)` may give an underestimate. Imagine that:*\n",
    "1. *Patients who will need ventilation at some point during their ICU stay remain in the ICU 10 times as long as ICU patients who will not never need ventilation*\n",
    "2. *All ICU patients who need ventilation need it for their entire stay*\n",
    "\n",
    "*The math is a bit more complicated and I haven't done it, but hopefully it's clear that in this scenario, almost all 100 of the patients will need ventilation at any given time.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Making the adjustment\n",
    "\n",
    "My model needs some way to adjust for this complication. My guess is that this parameter won't actually have a huge impact, so I'll just guess that it's fairly close to 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def icu_ventilation_adjustment(self):\n",
    "    return ergo.lognormal_from_interval(0.5, 1.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## My new model\n",
    "\n",
    "### New decomposition\n",
    "My new overall decomposition is:\n",
    "- \\# ventilators needed =\n",
    "  - max # icu patients *\n",
    "  - % of icu patients ventilated\n",
    "      - \\% icu patients that need ventilation at some point while in the ICU *\n",
    "      - icu-ventilation adjustment\n",
    "\n",
    "Updating my model with the new decomposition and submodels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ModelV4(ModelV3):\n",
    "    def icu_receive_ventilation(self):\n",
    "        \"% icu patients that need ventilation at some point while in the ICU\"\n",
    "        # as explained above, this sets our level of uncertainty\n",
    "        total_participants = 200\n",
    "        return ergo.beta_from_hits(total_participants * weighted_avg_percent_ventilated, total_participants)\n",
    "    \n",
    "    def icu_ventilation_adjustment(self):\n",
    "        \"adjustment to the % of icu patients ventilated at some point to get the % currently ventilated\"\n",
    "        return ergo.lognormal_from_interval(0.5, 1.5)\n",
    "        \n",
    "    def frac_icu_ventilated(self):\n",
    "        \"% of icu patients ventilated\"\n",
    "        return self.icu_receive_ventilation() * self.icu_ventilation_adjustment()\n",
    "\n",
    "model_v4 = ModelV4()\n",
    "model_v4.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This doesn't change my prediction that much."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model V5: Ensemble with community prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous model, I used a model based on data to predict the proportion of ICU patients that need ventilation at some point during their ICU stay.\n",
    "\n",
    "However, there's a Metaculus question for [what proportion of hospital patients will be admitted to the ICU](https://pandemic.metaculus.com/questions/4155/what-portion-of-in-hospital-cases-in-el-paso-county-will-require-admission-to-the-icu/), and another one for [what proportion of hospital patients will require ventilation](https://pandemic.metaculus.com/questions/4154/what-portion-of-in-hospital-patients-with-covid-19-in-el-paso-county-will-require-invasive-ventilation/).\n",
    "\n",
    "If I assume that all ventilated patients are in the ICU, I can do some simple math to get a Metaculus estimate for proportion of ICU patients ventilated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "icu_admit_per_hospital_admit_question = metaculus.get_question(4155)\n",
    "ventilation_per_hospital_admit = metaculus.get_question(4154)\n",
    "\n",
    "def metaculus_icu_receive_ventilation():\n",
    "    \"\"\"\n",
    "    The Metaculus community's implicit prediction for \n",
    "    % icu patients that need ventilation at some point while in the ICU\n",
    "    \"\"\"\n",
    "    icu_admit_per_hospital_admit_sample = icu_admit_per_hospital_admit_question.sample_community()\n",
    "    ventilation_per_hospital_admit_sample = ventilation_per_hospital_admit.sample_community()\n",
    "    \n",
    "    # each of these should be a positive number,\n",
    "    # but we're sampling from a distribution that includes 0\n",
    "    # and negative numbers. So clip them at just above 0.\n",
    "    icu_clipped = max(icu_admit_per_hospital_admit_sample, 0.01)\n",
    "    ventilated_clipped = max(ventilation_per_hospital_admit_sample, 0.01)\n",
    "    \n",
    "    return ventilated_clipped / icu_clipped"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing to my model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_samples = 100\n",
    "metaculus_samples = [metaculus_icu_receive_ventilation() for _ in range(num_samples)]\n",
    "seaborn.distplot(metaculus_samples)\n",
    "\n",
    "model_samples = ergo.run(model_v4.icu_receive_ventilation, num_samples=num_samples)\n",
    "seaborn.distplot(model_samples).set_title(model_v4.icu_receive_ventilation.__doc__)\n",
    "plt.legend(labels=[\"Metaculus\", \"My model\"])\n",
    "plt.show();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The implicit community prediction is similar to my model prediction, but much less confident.\n",
    "\n",
    "## My new model - now with ensembling!\n",
    "I can combine my prediction and the community prediction into an ensembled prediction. I'll sample from the Metaculus community prediction half the time, and from my model half the time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ModelV5(ModelV4):\n",
    "    def my_model_icu_receive_ventilation(self):\n",
    "        \"% icu patients that need ventilation at some point while in the ICU\"\n",
    "        ventilation_pseudocounts = 25 + 17 + 0.05 * 1150 + 0.1 * 132\n",
    "        icu_pseudocounts = 100 + 36 + 0.05 * 1300 + 0.1 * 196\n",
    "        return ergo.beta_from_hits(ventilation_pseudocounts, icu_pseudocounts)\n",
    "    \n",
    "    def metaculus_icu_receive_ventilation(self):\n",
    "        \"\"\"\n",
    "        The Metaculus community's implicit prediction for \n",
    "        % icu patients that need ventilation at some point while in the ICU\n",
    "        \"\"\"\n",
    "        icu_admit_per_hospital_admit_sample = icu_admit_per_hospital_admit_question.sample_community()\n",
    "        ventilation_per_hospital_admit_sample = ventilation_per_hospital_admit.sample_community()\n",
    "\n",
    "        icu_clipped = max(icu_admit_per_hospital_admit_sample, 0.01)\n",
    "        ventilated_clipped = max(ventilation_per_hospital_admit_sample, 0.01)\n",
    "\n",
    "        return ventilated_clipped / icu_clipped\n",
    "    \n",
    "    def icu_receive_ventilation(self):\n",
    "        \"% icu patients that need ventilation at some point while in the ICU\"\n",
    "        # use the community prediction half the time, my model half the time \n",
    "        if ergo.flip(0.5):\n",
    "            return self.metaculus_icu_receive_ventilation()\n",
    "        else:\n",
    "            return self.my_model_icu_receive_ventilation()\n",
    "\n",
    "model_v5 = ModelV5()\n",
    "model_v5.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not a big change, though it is apparent that with the less-confident community prediction added in, the model is now less confident that El Paso will need so many ventilators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Marginals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is my last model, so let's also take a look at all the predictions for its marginals (the steps in the decomposition):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v5.max_icu_patients)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v5.frac_icu_ventilated)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v5.icu_receive_ventilation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show_marginal(model_v5.icu_ventilation_adjustment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My original question was:\n",
    "\n",
    "> How many ventilators will be needed for COVID patients in El Paso county in May?\n",
    "\n",
    "We broke it down as follows:\n",
    "\n",
    "- **\\# ventilators needed [Result: 44.82 (7.12 to 608.06)]** =\n",
    "  - max # icu patients [Result: 106.50 (22.00 to 1174.00). Source: [Columbia model](https://github.com/shaman-lab/COVID-19Projection)] *\n",
    "  - % of icu patients ventilated [Result: 0.43 (0.16 to 1.02)] =\n",
    "      - \\% icu patients that need ventilation at some point while in the ICU [Result: 0.51 (0.22 to 1.03). Source: combination of [PabloStafforini et al.](https://pandemic.metaculus.com/questions/4154/what-portion-of-in-hospital-patients-with-covid-19-in-el-paso-county-will-require-invasive-ventilation/#comment-28155)] and Metaculus community prediction*\n",
    "      - icu-ventilation adjustment [Result: 0.88 (0.51 to 1.52). Source: guess.]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises for the reader\n",
    "\n",
    "Some questions to help you think through and extend my model:\n",
    "\n",
    "1. What did I change about the model at each step and how did that affect my prediction? Do you think the change in my prediction is likely to be an improvement?\n",
    "\n",
    "2. How could I decompose the question further or use better data/submodels to make a better prediction?\n",
    "  \n",
    "  a. How could I make a data/model based prediction for `icu-ventilation adjustment` instead of making a guess?\n",
    "\n",
    "  b. How could I make better use of the Metaculus community predictions to estimate `% icu patients that need ventilation at some point while in the ICU`? Does it make sense to sample from each of the 2 Metaculus questions independently of the other?\n",
    "\n",
    "3. What assumptions does the model make that might be unwarranted?\n",
    "  \n",
    "  a. What does the model assume about the relationship between the number of patients in the ICU on a given day, and the proportion of those patients who need to be ventilated?\n",
    "\n",
    "4. The final model is currently extremely uncertain.\n",
    "  \n",
    "  a. Why?\n",
    "\n",
    "  b. Is that uncertainty warranted?\n",
    "\n",
    "  c. How could you reduce it in a reasonable way?"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
